Overview

• WAIS: Wide Area Information Servers
• Project goals
• WAIS system architecture
• Connection Machine Document Retrieval System
• Relevance Feedback
• Parallel text retrieval algorithms
• Performance
• Future Systems
Levels of Information

• Personal files
• Workgroup file server
• Division database
• Corporate/Organization database
• Public databases

Goal: Access all levels from one interface
Wide Area Information Server Architecture

[Figure: architecture diagram. Components: Dow Jones; Directory of Servers; Gateways to other nets; TV Guide, etc.; LAN Server. Links: Z39.50 over X.25, TCP/IP, or modem (Open Interconnection Public Protocol); Z39.50 over LAN.]
User Needs:
Automatically Selecting Servers
Answering Questions
Organizing Responses

Architecture Issues:
Scalability
Security
Business model for servers
Reliable Access
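One way to picture "Automatically Selecting Servers" is a client that compares the words of a question against short descriptions held in a Directory of Servers. The sketch below assumes plain word overlap as the selection rule; the directory entries, `select_servers`, and `limit` are all hypothetical names for illustration, not part of WAIS.

```python
import re

# Hypothetical Directory of Servers: server name -> short description.
# These entries are invented for illustration only.
DIRECTORY = {
    "dow-jones": "business news finance markets companies real estate",
    "tv-guide": "television listings programs schedules",
    "lan-server": "workgroup memos project files",
}


def words(text: str) -> set[str]:
    """Lower-cased alphanumeric words in a piece of text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def select_servers(question: str, directory: dict[str, str], limit: int = 2) -> list[str]:
    """Return up to `limit` servers whose descriptions share words with the question.

    Servers are ordered by number of shared words (most first), then by name.
    """
    q = words(question)
    overlap = {name: len(q & words(desc)) for name, desc in directory.items()}
    candidates = [name for name, count in overlap.items() if count > 0]
    candidates.sort(key=lambda name: (-overlap[name], name))
    return candidates[:limit]


print(select_servers("Japanese buying real estate in Manhattan", DIRECTORY))
# prints ['dow-jones']
```

A real directory would need richer descriptions than a handful of words, but the shape is the same: the client, not the user, decides which servers receive the question.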
WAIS Clients

• Provide easy access to multiple sources
• Busy 24 hours a day finding information
• Automatically learn the user's preferences
• Scour the world (within a budget) to find new sources
• Current implementations on PC, Macintosh, X Windows, NeXT, dumb terminal
The WAIS Protocol is WAIS

• Supports any search syntax
• Supports sophisticated clients: puts intelligence in the user's hands
• Clients can run on any platform
• Multiple servers in a single search
• Retrieve any kind of data: text, graphics, video, ...
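When one search goes to several servers, the client has to organize the answers into a single list. A minimal sketch, assuming each server returns `(title, score)` pairs on its own scale; dividing by each server's best score is an assumed normalization, and `merge_results` and the server and article names are hypothetical.

```python
def merge_results(per_server: dict[str, list[tuple[str, float]]],
                  top: int = 5) -> list[tuple[str, str, float]]:
    """Merge (title, score) lists from several servers into one ranked list.

    Assumption: each server scores on its own scale, so every score is divided
    by that server's best score before the lists are combined.
    """
    merged: list[tuple[str, str, float]] = []
    for server, hits in per_server.items():
        if not hits:
            continue
        best = max(score for _, score in hits)
        for title, score in hits:
            merged.append((server, title, score / best if best > 0 else 0.0))
    # Highest normalized score first; ties broken by server name, then title.
    merged.sort(key=lambda row: (-row[2], row[0], row[1]))
    return merged[:top]


results = merge_results({
    "server-a": [("Article X", 10), ("Article Y", 5)],
    "server-b": [("Article Z", 200), ("Article W", 150)],
})
for server, title, score in results:
    print(f"{score:.2f}  {server}  {title}")
```

Normalizing per server keeps one server's large raw scores from crowding out another's, at the cost of treating each server's top hit as equally good.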
Connection Machine CM2a

WAIS via X Windows or GMACS
Connection Machine Server

• Interactive full-text retrieval with large queries
• 1-25 GBytes on the current CM-2 product
• Terabytes on the future CM-5 system
• Supports thousands of users
• Automatic indexing
• Uses words and phrases in the question to find appropriate documents
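A minimal sketch of using "words and phrases in the question" to rank documents. Treating adjacent word pairs as phrases and weighting each shared term by log(1 + N/df) are assumptions chosen for illustration (an IDF-style weight), not the server's documented formula; `terms`, `rank`, and the sample documents are hypothetical.

```python
import math
import re
from collections import Counter


def terms(text: str) -> list[str]:
    """Single words plus adjacent-word pairs, used here as 'phrases'."""
    ws = re.findall(r"[a-z0-9]+", text.lower())
    return ws + [f"{a} {b}" for a, b in zip(ws, ws[1:])]


def rank(question: str, docs: dict[str, str]) -> list[tuple[str, float]]:
    """Score each document by the question terms it contains.

    Each shared term adds log(1 + N / df), an assumed IDF-style weight,
    so terms found in fewer documents count for more.
    """
    doc_terms = {doc_id: set(terms(text)) for doc_id, text in docs.items()}
    df = Counter(t for ts in doc_terms.values() for t in ts)
    n = len(docs)
    q = set(terms(question))
    scores = {
        doc_id: sum(math.log(1 + n / df[t]) for t in q & ts)
        for doc_id, ts in doc_terms.items()
    }
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


docs = {
    "d1": "Japanese investors buy real estate in Manhattan",
    "d2": "Real estate prices fall in California",
    "d3": "Television schedules",
}
for doc_id, score in rank("Japanese real estate Manhattan", docs):
    print(doc_id, round(score, 2))
```

Here "real estate" counts both as two words and as one phrase, so a document that keeps the words together scores higher than one that merely mentions each.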
Why a Connection Machine?

• Bigger databases
• Interactive full-text search on gigabytes to terabytes
• More robust search techniques, e.g. relevance feedback, weighted terms
Boolean Search
Retrieve documents containing specific combinations of words

Conceptual Search
Explore a set of documents containing related concepts
Hard to Use:
Complex Syntax

Poor Results:
The wrong information
No ranking of results
(Japanese OR Japan) AND
(building OR buildings OR (Real AND Estate)) AND
(Manhattan OR (New AND York))

Have you been paying attention?...
Freer Finance: U.S. Regulators Move...
REAL ESTATE: California Initiatives-
First Boston Said To Agree on Sale Of...
Exxon, Rockefeller Group to Sell Site...
What's News-Business and Finance
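The Boolean query above can be written as a plain predicate over a document's words, which makes its two weaknesses concrete: a document either matches or it does not, and relevant wording outside the listed terms is missed. The headlines in this sketch are invented for illustration, and `matches` is a hypothetical name.

```python
import re


def words(text: str) -> set[str]:
    return set(re.findall(r"[a-z]+", text.lower()))


def matches(text: str) -> bool:
    """(Japanese OR Japan) AND (building OR buildings OR (Real AND Estate))
    AND (Manhattan OR (New AND York)), tested on exact words only."""
    w = words(text)
    return (
        ("japanese" in w or "japan" in w)
        and ("building" in w or "buildings" in w or ("real" in w and "estate" in w))
        and ("manhattan" in w or ("new" in w and "york" in w))
    )


headlines = [
    "Japanese firm buys Manhattan office building",
    "Japan's investors buy a New York tower",
]
print([h for h in headlines if matches(h)])
# prints ['Japanese firm buys Manhattan office building']
```

The second headline is on the same topic but says "tower", so the query drops it; and the matches that do come back carry no score, so they cannot be ranked.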
Easy to Use:
No Syntax

Japanese buying real estate in mid-town manhattan

Options:
What do you want to follow up?

1. Time Acts to Cut Magazine Costs...
2. First Boston Said To Agree on Sale...
3. Have You Been Paying Attention?
4. Exxon, Rockefeller Group to Sell Site...
5. Hard Sell: Real Estate Developers...
6. What's News-Business and Finance...
7. Integrated Resources Buys Loft Building...
Relevance Feedback:
I like these; show me more

First Boston Said To Agree on Sale...
Exxon, Rockefeller Group to Sell Site...

Improved results:
Articles on related topics are found
Results are ranked

1. Bids for Exxon Building in New York...
2. Time Acts to Cut Magazine Costs...
3. Hard Sell: Real Estate Developers...
4. Time Inc. Sells Its 45% Interest...
5. Citicorp Unit Moves to Foreclose on...
6. Litigious Landlords: Legal Maneuvers
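"I like these; show me more" can be sketched as turning the liked articles themselves into the next query. The weighting here (a word counts once for each liked article that contains it) and the small stop-word list are assumptions for illustration; `feedback_query`, `rerank`, and the sample texts are hypothetical.

```python
import re
from collections import Counter

# A tiny stop-word list (an assumption) so that words such as "to" and "in"
# do not dominate the feedback query.
STOP = {"a", "an", "and", "in", "of", "on", "the", "to"}


def words(text: str) -> set[str]:
    return set(re.findall(r"[a-z]+", text.lower())) - STOP


def feedback_query(liked: list[str]) -> Counter:
    """Weight each word by how many of the liked articles contain it."""
    query: Counter = Counter()
    for text in liked:
        query.update(words(text))
    return query


def rerank(liked: list[str], candidates: list[str]) -> list[str]:
    """Rank unseen candidates by the summed feedback weight of their words."""
    query = feedback_query(liked)
    pool = [c for c in candidates if c not in liked]
    return sorted(pool, key=lambda c: -sum(query[w] for w in words(c)))


liked = ["Developer sells Manhattan office building",
         "Manhattan office tower changes hands"]
candidates = ["Bids for Manhattan office building",
              "Television ratings rise",
              "Landlords sue over office rents"]
print(rerank(liked, candidates))
```

Words shared by several liked articles ("Manhattan", "office") carry the most weight, which is how a related article with none of the original question's words can still rise to the top.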
Query Broadcast To Database on Connection Machine System

[Figure: labels "Document Units" and "Scores"; query terms Tripoli, Libyan, PLO, bomb; score values 75, 42, 80, 52.]
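The broadcast idea can be simulated on one machine: every document unit keeps its own score, each query term is sent to all units in one step, and units holding the term add its weight. This is a sketch under simplifying assumptions: each unit stores a plain set of words (no compressed encoding), the term weights are hypothetical, and the list comprehension only stands in for what the hardware would do in parallel.

```python
def broadcast_search(units: list[set[str]], query: dict[str, int],
                     k: int = 2) -> list[tuple[int, int]]:
    """Simulate SIMD scoring with one document unit per document.

    Each query term is broadcast to all units in one step; every unit that
    holds the term adds the term's weight to its own score register.
    """
    scores = [0] * len(units)  # one score register per document unit
    for term, weight in query.items():  # one broadcast step per query term
        # On the Connection Machine all units would test the term at once;
        # this list comprehension stands in for that single parallel step.
        scores = [s + (weight if term in unit else 0)
                  for s, unit in zip(scores, units)]
    # Return the k best (unit index, score) pairs; ties go to the lower index.
    best = sorted(range(len(units)), key=lambda i: (-scores[i], i))[:k]
    return [(i, scores[i]) for i in best]


units = [{"tripoli", "libyan"}, {"plo", "bomb", "tripoli"}, {"weather"}]
query = {"tripoli": 3, "libyan": 2, "plo": 2, "bomb": 1}  # hypothetical weights
print(broadcast_search(units, query))
# prints [(1, 6), (0, 5)]
```

In this sketch the number of broadcast steps equals the number of query terms, independent of how many units there are.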
[Figure labels: "User"; "number of query terms"]
• Current algorithm limits:
  ~2 GB with 512 MB CM-2
  ~8 GB with 2 GB CM-2
  ~25 GB with 8 GB CM-2

High recall, high precision (Stanfill and Kahle, see Communications of the ACM, December 1986)

< 1 sec. response

Much larger DBs searchable with CM-5 and inverted index algorithms: 100s to 1000s of Gigabytes
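For reference, an inverted index maps each word to the documents that contain it, so a search touches only the posting lists of the query terms instead of every document. This is a single-machine sketch of that data structure, not the CM-5 algorithm; `build_index`, `search`, and the default weight of 1.0 are hypothetical choices.

```python
from collections import defaultdict
from typing import Optional


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each word to the set of document ids that contain it (its posting list)."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return dict(index)


def search(index: dict[str, set[str]], query: list[str],
           weights: Optional[dict[str, float]] = None) -> list[tuple[str, float]]:
    """Add each query term's weight (default 1.0) to every document on its posting list."""
    scores: dict[str, float] = defaultdict(float)
    for term in query:
        w = 1.0 if weights is None else weights.get(term, 1.0)
        for doc_id in index.get(term, ()):
            scores[doc_id] += w
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


docs = {"a": "tripoli bomb", "b": "libyan tripoli", "c": "weather report"}
print(search(build_index(docs), ["tripoli", "bomb"]))
# prints [('a', 2.0), ('b', 1.0)]
```

Document "c" is never examined, which is the property that lets index-based search scale past what a full scan of every document can cover.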
• An advanced information retrieval service offered by Dow Jones News/Retrieval since January 1989.
• Simple and powerful search-by-example model.
• Prime the system with a few words to find an article you like.
• Search again using the good article: "Give me more articles like that."
• The full-text database of over 400 publications is examined and compared with the reference article.
• The 16 top-scoring "best fit" articles are retrieved almost instantly.
• The process is repeated until you find just the information you want.
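The loop described above (prime with a few words, pick a good article, search again with it, get the 16 best fits) can be sketched as follows. Word overlap stands in for the service's actual scoring, which the slide does not describe; only the figure of 16 comes from the slide, and `best_fit`, `search_by_example`, and the `pick` callback are hypothetical.

```python
import heapq
import re
from typing import Callable, Optional


def words(text: str) -> set[str]:
    return set(re.findall(r"[a-z]+", text.lower()))


def best_fit(reference: str, articles: list[str], n: int = 16) -> list[str]:
    """The n articles sharing the most words with the reference text."""
    ref = words(reference)
    return heapq.nlargest(n, articles, key=lambda a: len(ref & words(a)))


def search_by_example(seed: str, articles: list[str],
                      pick: Callable[[list[str]], Optional[str]],
                      rounds: int = 3, n: int = 16) -> list[str]:
    """Start from a few seed words, then repeatedly search with a chosen article."""
    reference = seed
    results: list[str] = []
    for _ in range(rounds):
        results = best_fit(reference, articles, n)
        chosen = pick(results)  # the user picks an article they like, or None to stop
        if chosen is None:
            break
        reference = chosen  # "Give me more articles like that."
    return results


articles = ["Exxon building sale in New York",
            "Japanese buyers of New York real estate",
            "Weather report for Boston"]
print(search_by_example("Japanese real estate", articles, pick=lambda r: r[0], n=2))
```

`heapq.nlargest` keeps only the top n without sorting the whole collection, which suits a fixed-size "best fit" list drawn from a large database.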